Graph kernels for chemoinformatics – a critical discussion
Author
Abstract
We analyze the use, advantages, and drawbacks of graph kernels in chemoinformatics, including a comparison of kernel-based approaches with other methods, as well as examples of applications. Kernel-based machine learning [1], now widely applied in chemoinformatics, delivers state-of-the-art performance [2] in tasks like classification and regression. Molecular graph kernels [3] are a recent development where kernels are defined directly on the molecular structure graph. This allows the adaptation of methods from graph theory to structure graphs and their direct use with kernel learning algorithms. The main advantage of kernel learning, the so-called “kernel trick”, allows for a systematic, computationally feasible, and often globally optimal search for non-linear patterns, as well as the direct use of non-numerical inputs such as strings and graphs. The drawbacks are that solutions are expressed indirectly in terms of similarity to training samples, and that runtimes are typically quadratic or cubic in the number of training samples. Graph kernels [3] are positive semidefinite functions defined directly on graphs. The most important types are based on random walks, subgraph patterns, optimal assignments, and graphlets. Molecular structure graphs have strong properties that can be exploited [4], e.g., they are undirected, have no self-loops and no multiple edges, are connected (except for salts), annotated, often planar in the graph-theoretic sense, and their vertex degree is bounded by a small constant. In many applications, they are small. Many graph kernels are general-purpose, some are suitable for structure graphs, and a few have been explicitly designed for them.
We present three exemplary applications of the iterative similarity optimal assignment kernel [5], which was designed for the comparison of small structure graphs: the discovery of novel agonists of the peroxisome proliferator-activated receptor γ [6] (ligand-based virtual screening), the estimation of acid dissociation constants [7] (quantitative structure-property relationships), and molecular de novo design [8].
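To make the notion of a kernel defined directly on structure graphs concrete, the following is a minimal sketch of a fixed-length labeled walk kernel, one of the simplest members of the random-walk family mentioned above. It is not the iterative similarity optimal assignment kernel of [5]. The function names (`make_graph`, `walk_kernel`), the parameters `max_len` and `decay`, and the three heavy-atom toy graphs are assumptions made purely for illustration.

```python
from itertools import product


def make_graph(labels, bonds):
    """Build an undirected labeled graph as (labels, adjacency lists)."""
    adj = [[] for _ in labels]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return labels, adj


def walk_kernel(g, h, max_len=3, decay=0.5):
    """Weighted count of walk pairs (one in g, one in h) with equal atom labels.

    Walks of length l (in edges) are weighted by decay**l. The count is taken
    on the direct product graph, whose vertices are atom pairs with equal labels.
    """
    labels_g, adj_g = g
    labels_h, adj_h = h
    # counts[(u, v)] = number of product-graph walks of the current length
    # that start at the label-matching atom pair (u, v).
    counts = {
        (u, v): 1.0
        for u, v in product(range(len(labels_g)), range(len(labels_h)))
        if labels_g[u] == labels_h[v]
    }
    total = sum(counts.values())  # walks of length 0
    weight = 1.0
    for _ in range(max_len):
        weight *= decay
        # Extend every walk by one step: sum over label-matching neighbor pairs.
        counts = {
            (u, v): sum(counts.get((a, b), 0.0)
                        for a in adj_g[u] for b in adj_h[v])
            for (u, v) in counts
        }
        total += weight * sum(counts.values())
    return total


# Hypothetical toy molecules (heavy atoms only, bond orders ignored).
ethanol = make_graph(["C", "C", "O"], [(0, 1), (1, 2)])
ether = make_graph(["C", "O", "C"], [(0, 1), (1, 2)])
propane = make_graph(["C", "C", "C"], [(0, 1), (1, 2)])

molecules = [ethanol, ether, propane]
# Gram matrix: quadratic in the number of samples, as noted in the abstract.
gram = [[walk_kernel(a, b) for b in molecules] for a in molecules]
```

Because each term counts walks with identical label sequences in both graphs, the function corresponds to an explicit inner product of walk-count vectors and is therefore positive semidefinite. A Gram matrix such as `gram` could be passed to any kernel learner that accepts precomputed kernels.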
Similar resources
Two New Graph Kernels and Applications to Chemoinformatics
Chemoinformatics is a well established research field concerned with the discovery of molecule’s properties through informational techniques. Computer science’s research fields mainly concerned by the chemoinformatics field are machine learning and graph theory. From this point of view, graph kernels provide a nice framework combining machine learning techniques with graph theory. Such kernels ...
Two new graphs kernels in chemoinformatics
Chemoinformatics is a well established research field concerned with the discovery of molecule’s properties through informational techniques. Computer science’s research fields mainly concerned by chemoinformatics are machine learning and graph theory. From this point of view, graph kernels provide a nice framework combining machine learning graph theory techniques. Such kernels prove their eff...
Treelet kernel incorporating cyclic, stereo and inter pattern information in chemoinformatics
Chemoinformatics is a research field concerned with the study of physical or biological molecular properties through computer science’s research fields such as machine learning and graph theory. From this point of view, graph kernels provide a nice framework which allows to naturally combine machine learning and graph theory techniques. Graph kernels based on bags of patterns have proven their ...
Graph Kernels for Chemoinformatics
In chemoinformatics and bioinformatics, it is effective to automatically predict the properties of chemical compounds and proteins with computer-aided methods, since this can substantially reduce the costs of research and development by screening out unlikely compounds and proteins from the candidates for “wet” experiment. Data-driven predictive modeling is one of the main research topics in che...
Journal title:
Volume 3, Issue
Pages -
Publication date: 2011